<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Chapter 11 Genomics | 小蓝哥的知识荒原</title>
<meta name="author" content="李详">
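<meta name="description" content="11.1 Gene Family Analysis Gene family analysis is a common bioinformatics workflow and, like GEO data mining, a popular data-mining route to a short paper. Family members can be identified either by searching with the family's hmm profile from Pfam or by blast alignment, usually with the corresponding Arabidopsis family members as queries. Plant Transcription Factor Database: click to visit.  11.1.1 Analysis Strategy and Manuscript Outline...">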
<meta name="generator" content="bookdown 0.24 with bs4_book()">
<meta property="og:title" content="Chapter 11 Genomics | 小蓝哥的知识荒原">
<meta property="og:type" content="book">
<meta property="og:image" content="https://raw.githubusercontent.com/DivadNojnarg/outstanding-shiny-ui/master/images/intro/crc-press-cover.svg">
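<meta property="og:description" content="11.1 Gene Family Analysis Gene family analysis is a common bioinformatics workflow and, like GEO data mining, a popular data-mining route to a short paper. Family members can be identified either by searching with the family's hmm profile from Pfam or by blast alignment, usually with the corresponding Arabidopsis family members as queries. Plant Transcription Factor Database: click to visit.  11.1.1 Analysis Strategy and Manuscript Outline...">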
<meta name="twitter:card" content="summary">
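<meta name="twitter:title" content="Chapter 11 Genomics | 小蓝哥的知识荒原">
<meta name="twitter:description" content="11.1 Gene Family Analysis Gene family analysis is a common bioinformatics workflow and, like GEO data mining, a popular data-mining route to a short paper. Family members can be identified either by searching with the family's hmm profile from Pfam or by blast alignment, usually with the corresponding Arabidopsis family members as queries. Plant Transcription Factor Database: click to visit.  11.1.1 Analysis Strategy and Manuscript Outline...">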
<meta name="twitter:image" content="https://raw.githubusercontent.com/DivadNojnarg/outstanding-shiny-ui/master/images/intro/crc-press-cover.svg">
<!-- JS --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js" integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A==" crossorigin="anonymous"></script><script src="https://kit.fontawesome.com/6ecbd6c532.js" crossorigin="anonymous"></script><script src="libs/header-attrs-2.11/header-attrs.js"></script><script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="libs/bootstrap-4.6.0/bootstrap.min.css" rel="stylesheet">
<script src="libs/bootstrap-4.6.0/bootstrap.bundle.min.js"></script><script src="libs/bs3compat-0.3.1/transition.js"></script><script src="libs/bs3compat-0.3.1/tabs.js"></script><script src="libs/bs3compat-0.3.1/bs3compat.js"></script><link href="libs/bs4_book-1.0.0/bs4_book.css" rel="stylesheet">
<script src="libs/bs4_book-1.0.0/bs4_book.js"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js" integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js" integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww==" crossorigin="anonymous"></script><!-- CSS --><link rel="stylesheet" href="css/style.css">
</head>
<body data-spy="scroll" data-target="#toc">

<div class="container-fluid">
<div class="row">
  <header class="col-sm-12 col-lg-3 sidebar sidebar-book"><a class="sr-only sr-only-focusable" href="#content">Skip to main content</a>

    <div class="d-flex align-items-start justify-content-between">
      <h1>
        <a href="index.html" title="">小蓝哥的知识荒原</a>
      </h1>
      <button class="btn btn-outline-primary d-lg-none ml-2 mt-1" type="button" data-toggle="collapse" data-target="#main-nav" aria-expanded="true" aria-controls="main-nav"><i class="fas fa-bars"></i><span class="sr-only">Show table of contents</span></button>
    </div>

    <div id="main-nav" class="collapse-lg">
      <form role="search">
        <input id="search" class="form-control" type="search" placeholder="Search" aria-label="Search">
</form>

      <nav aria-label="Table of contents"><h2>Table of contents</h2>
        <ul class="book-toc list-unstyled">
<li><a class="" href="index.html">Introduction</a></li>
<li class="book-part">Part I: R</li>
<li><a class="" href="r-base.html"><span class="header-section-number">1</span> Language Basics</a></li>
<li><a class="" href="r-stat.html"><span class="header-section-number">2</span> Statistical Analysis</a></li>
<li><a class="" href="r-vis.html"><span class="header-section-number">3</span> Data Visualization</a></li>
<li><a class="" href="r-deve.html"><span class="header-section-number">4</span> R Development</a></li>
<li class="book-part">Part Ⅱ: Python</li>
<li><a class="" href="python-base.html"><span class="header-section-number">5</span> Python Basics</a></li>
<li><a class="" href="python-stat.html"><span class="header-section-number">6</span> Statistical Data Analysis</a></li>
<li><a class="" href="python-spider.html"><span class="header-section-number">7</span> Web Scraping with Python</a></li>
<li><a class="" href="ai.html"><span class="header-section-number">8</span> Artificial Intelligence</a></li>
<li class="book-part">Part Ⅲ: Bioinformatics</li>
<li><a class="" href="bio-base.html"><span class="header-section-number">9</span> Fundamentals</a></li>
<li><a class="" href="bio-env.html"><span class="header-section-number">10</span> Environment Setup</a></li>
<li><a class="active" href="genomics.html"><span class="header-section-number">11</span> Genomics</a></li>
<li><a class="" href="rnaseq.html"><span class="header-section-number">12</span> RNA-Seq</a></li>
<li><a class="" href="meta.html"><span class="header-section-number">13</span> Metabolomics</a></li>
<li><a class="" href="pro.html"><span class="header-section-number">14</span> Proteomics</a></li>
<li><a class="" href="multi.html"><span class="header-section-number">15</span> Multi-omics</a></li>
<li class="book-part">Part Ⅳ: Literature Reading</li>
<li><a class="" href="patho.html"><span class="header-section-number">16</span> Plant Pathology</a></li>
<li><a class="" href="liter-genomics.html"><span class="header-section-number">17</span> Genomics</a></li>
<li><a class="" href="liter-rnaseq.html"><span class="header-section-number">18</span> RNA-Seq</a></li>
<li><a class="" href="liter-meta.html"><span class="header-section-number">19</span> Metabolomics</a></li>
<li><a class="" href="liter-pro.html"><span class="header-section-number">20</span> Proteomics</a></li>
<li class="book-part">Part Ⅴ: Publications</li>
<li><a class="" href="myarticle.html"><span class="header-section-number">21</span> Publications</a></li>
<li class="book-part">Part Ⅵ: Class Notes</li>
<li><a class="" href="class.html"><span class="header-section-number">22</span> Class Notes</a></li>
<li class="book-part">Part Ⅶ: Other</li>
<li><a class="" href="other.html"><span class="header-section-number">23</span> Other Notes</a></li>
<li><a class="" href="references.html">References</a></li>
</ul>

        <div class="book-extra">
          <p><a id="book-repo" href="https://github.com/lixiang117423/book4xiang">View book source <i class="fab fa-github"></i></a></p>
        </div>
      </nav>
</div>
  </header><main class="col-sm-12 col-md-9 col-lg-7" id="content"><div id="genomics" class="section level1" number="11">
<h1>
<span class="header-section-number">Chapter 11</span> Genomics<a class="anchor" aria-label="anchor" href="#genomics"><i class="fas fa-link"></i></a>
</h1>
<div id="基因家族分析" class="section level2" number="11.1">
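<h2>
<span class="header-section-number">11.1</span> Gene Family Analysis<a class="anchor" aria-label="anchor" href="#%E5%9F%BA%E5%9B%A0%E5%AE%B6%E6%97%8F%E5%88%86%E6%9E%90"><i class="fas fa-link"></i></a>
</h2>
<p>Gene family analysis is a common bioinformatics workflow and, like GEO data mining, a popular data-mining route to a short paper. Family members can be identified either by searching with the family's <code>hmm</code> profile from <a href="http://pfam.xfam.org/">Pfam</a> or by <code>blast</code> alignment, usually with the corresponding <em>Arabidopsis</em> family members as queries.
The Plant Transcription Factor Database is available <a href="http://planttfdb.gao-lab.org/">here</a>.</p>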
<div id="基因家族分析思路及文章撰写思路" class="section level3" number="11.1.1">
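<h3>
<span class="header-section-number">11.1.1</span> Analysis Strategy and Manuscript Outline<a class="anchor" aria-label="anchor" href="#%E5%9F%BA%E5%9B%A0%E5%AE%B6%E6%97%8F%E5%88%86%E6%9E%90%E6%80%9D%E8%B7%AF%E5%8F%8A%E6%96%87%E7%AB%A0%E6%92%B0%E5%86%99%E6%80%9D%E8%B7%AF"><i class="fas fa-link"></i></a>
</h3>
<p>Gene family analysis has followed GEO data mining as a newer data-mining strategy in bioinformatics.</p>
<p>For a guide to carrying out a gene family study, see this post: <a href="http://www.planttech.com.cn/blog/58882464a46" class="uri">http://www.planttech.com.cn/blog/58882464a46</a>.</p>
<p>Why choose <em>Panax notoginseng</em> (Sanqi)?</p>
<p>First, few reference genomes exist for <em>Panax notoginseng</em> so far: our lab has one, and Vice President Yang Shengchao published the latest one this year. These genome data give us a solid basis for gene family studies in this species. Second, our lab works mainly on <em>Panax notoginseng</em> and holds plenty of material, which makes later experimental validation convenient.</p>
<p>For the manuscript format, use <strong><em>Frontiers in Plant Science</em></strong> as the template; this makes later revisions easier.</p>
<p>Most gene families studied this way have plenty of review articles, because they are largely transcription factor families, and transcription factors tend to be well reviewed.</p>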
</div>
<div id="数据准备" class="section level3" number="11.1.2">
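<h3>
<span class="header-section-number">11.1.2</span> Data Preparation<a class="anchor" aria-label="anchor" href="#%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87"><i class="fas fa-link"></i></a>
</h3>
<p>The main input is the reference genome data: the genome sequence in <code>fasta</code> format, the genome annotation in <code>gff</code> or <code>gtf</code> format, the protein sequences (usually one per transcript), and the <code>cDNA</code> sequences. If transcriptome data are available, run the corresponding transcriptome analysis as well. In addition to these files, a profile file in <code>.hmm</code> format is required.</p>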
</div>
<div id="软件准备" class="section level3" number="11.1.3">
<h3>
<span class="header-section-number">11.1.3</span> Software Preparation<a class="anchor" aria-label="anchor" href="#%E8%BD%AF%E4%BB%B6%E5%87%86%E5%A4%87"><i class="fas fa-link"></i></a>
</h3>
<p>All you need is a working knowledge of Linux and a Docker installation; then pull the image published by <code>组学大讲堂</code>. The image is listed <a href="https://hub.docker.com/r/omicsclass/gene-family">here</a>. For installing and using Docker, see <a href="bio-env.html#WSL4Docker">10.1</a>.</p>
</div>
<div id="分析过程" class="section level3" number="11.1.4">
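<h3>
<span class="header-section-number">11.1.4</span> Analysis Workflow<a class="anchor" aria-label="anchor" href="#%E5%88%86%E6%9E%90%E8%BF%87%E7%A8%8B"><i class="fas fa-link"></i></a>
</h3>
<ul>
<li>Extracting mRNA IDs and gene IDs
A gene can have several transcripts, so several mRNA IDs can map to one gene ID. Downstream analysis needs only one transcript ID per gene, because the transcripts of a gene usually differ little in sequence. The extraction commands:</li>
</ul>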
<div class="sourceCode" id="cb153"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb153-1"><a href="genomics.html#cb153-1" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>mRNAid_to_geneid.pl data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.gff<span class="sc">*</span> results<span class="sc">/</span>step.<span class="fl">1.</span>get.mRNA.and.gene.ID<span class="sc">/</span>mRNA2geneID.txt</span>
<span id="cb153-2"><a href="genomics.html#cb153-2" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>geneid_to_mRNAid.pl data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.gff<span class="sc">*</span> results<span class="sc">/</span>step.<span class="fl">1.</span>get.mRNA.and.gene.ID<span class="sc">/</span>geneID2mRNAid.txt</span></code></pre></div>
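<p>The two Perl helpers are not shown in this chapter, so the following is only a minimal Python sketch of the kind of mapping they produce. It assumes a GFF3 annotation in which each <code>mRNA</code> row carries <code>ID=</code> and <code>Parent=</code> attributes; the function names <code>mrna_to_gene</code> and <code>gene_to_mrnas</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def mrna_to_gene(gff_lines):
    """Map each mRNA ID to its parent gene ID (assumes GFF3 ID=/Parent=)."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "ID" in attrs and "Parent" in attrs:
            mapping[attrs["ID"]] = attrs["Parent"]
    return mapping


def gene_to_mrnas(mapping):
    """Invert the map: gene ID to the list of its mRNA IDs, in input order."""
    genes = {}
    for mrna, gene in mapping.items():
        genes.setdefault(gene, []).append(mrna)
    return genes


example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=gene1.t1;Parent=gene1",
    "chr1\tsrc\tmRNA\t100\t800\t.\t+\t.\tID=gene1.t2;Parent=gene1",
]
print(gene_to_mrnas(mrna_to_gene(example)))
</code></pre></div>
<p>Annotations that label transcripts with a feature type other than <code>mRNA</code>, or GTF files, would need a different parser.</p>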
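<ul>
<li>Searching for the domain
This step uses the <code>.hmm</code> file to find the protein sequences of the species that contain the domain. The inputs are the <code>.hmm</code> file and the protein file; the output is the <code>hmmsearch</code> result table. The value used for later filtering is the <code>E-value</code>; some papers use 0.001 as the threshold. The <code>of</code> column gives the total number of hits to this domain found in a given protein sequence.</li>
</ul>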
<div class="sourceCode" id="cb154"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb154-1"><a href="genomics.html#cb154-1" aria-hidden="true" tabindex="-1"></a>hmmsearch <span class="sc">--</span>domtblout results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>hmm.txt <span class="sc">--</span>cut_tc data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.hmm data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.pep<span class="sc">*</span></span></code></pre></div>
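<p>To make the columns mentioned above concrete, here is a small Python sketch that reads rows of an <code>hmmsearch --domtblout</code> table. The column positions follow the HMMER table layout (22 whitespace-separated columns followed by a free-text description); the names <code>DomainHit</code>, <code>parse_domtblout</code> and <code>domains_per_target</code> are illustrative and not part of HMMER.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from collections import namedtuple

# Subset of the columns of a --domtblout row (field names are illustrative).
DomainHit = namedtuple(
    "DomainHit",
    "target query seq_evalue dom_index dom_total i_evalue env_from env_to",
)


def parse_domtblout(lines):
    """Parse the non-comment rows of an hmmsearch --domtblout table."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # 22 whitespace-separated columns, then a free-text description.
        f = line.split(None, 22)
        hits.append(DomainHit(
            target=f[0],
            query=f[3],
            seq_evalue=float(f[6]),   # full-sequence E-value
            dom_index=int(f[9]),      # the "#" column
            dom_total=int(f[10]),     # the "of" column
            i_evalue=float(f[12]),    # per-domain independent E-value
            env_from=int(f[19]),
            env_to=int(f[20]),
        ))
    return hits


def domains_per_target(hits):
    """Number of domain hits reported for each target sequence."""
    return {h.target: h.dom_total for h in hits}
</code></pre></div>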
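<ul>
<li>Selecting the domain
A single transcript of a gene may match the domain more than once, so one of the matched domains has to be chosen; by default the first one is taken. The last argument of the command below is an <code>E-value</code> threshold applied to the <code>hmmsearch</code> output. To keep the first domain of every hit, set the threshold to 1.</li>
</ul>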
<div class="sourceCode" id="cb155"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb155-1"><a href="genomics.html#cb155-1" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>domain_xulie.pl results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>hmm.txt data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.pep<span class="sc">*</span> results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>domain.fa <span class="fl">1.2e-28</span></span></code></pre></div>
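<p>How <code>domain_xulie.pl</code> chooses coordinates is not documented here, so the sketch below only illustrates the idea of "first domain per protein under an E-value threshold". It assumes 1-based inclusive domain coordinates (for example the envelope columns of the <code>--domtblout</code> table) and a hypothetical function name <code>first_domain_sequences</code>.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def first_domain_sequences(hits, proteins, max_evalue):
    """Cut out the first domain of each protein that passes the threshold.

    hits: iterable of (target, dom_index, evalue, start, end) tuples with
          1-based inclusive coordinates (an assumption of this sketch).
    proteins: dict of protein ID to amino-acid sequence.
    """
    domains = {}
    for target, dom_index, evalue, start, end in hits:
        if dom_index != 1 or evalue > max_evalue:
            continue
        # Convert 1-based inclusive coordinates to a Python slice.
        domains[target] = proteins[target][start - 1:end]
    return domains


proteins = {"p1": "MKTAYIAKQR", "p2": "ACDEFG"}
hits = [
    ("p1", 1, 1e-30, 3, 6),   # kept: first domain, small E-value
    ("p1", 2, 1e-30, 7, 9),   # skipped: second domain of the same protein
    ("p2", 1, 0.5, 1, 3),     # skipped: E-value above the threshold
]
print(first_domain_sequences(hits, proteins, 1e-10))
</code></pre></div>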
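<ul>
<li>Multiple sequence alignment
The downloaded <code>.hmm</code> file is a hidden Markov model built from this domain across many species. Aligning the domain sequences retrieved from the species under study, and then rebuilding a species-specific hidden Markov model for the family from that alignment, gives a more accurate search.</li>
</ul>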
<div class="sourceCode" id="cb156"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb156-1"><a href="genomics.html#cb156-1" aria-hidden="true" tabindex="-1"></a>echo <span class="sc">-</span>e <span class="st">'1</span><span class="sc">\n</span><span class="st">results/step.2.domain.search/domain.fa</span><span class="sc">\n</span><span class="st">2</span><span class="sc">\n</span><span class="st">1</span><span class="sc">\n</span><span class="st">results/step.2.domain.search/out.aln</span><span class="sc">\n</span><span class="st">r.domain.search/out.dnd</span><span class="sc">\n</span><span class="st">X</span><span class="sc">\n\n\n</span><span class="st">X</span><span class="sc">\n</span><span class="st">'</span> <span class="sc">|</span>clustalw</span></code></pre></div>
<ul>
<li>Rebuilding the hidden Markov model</li>
</ul>
<div class="sourceCode" id="cb157"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb157-1"><a href="genomics.html#cb157-1" aria-hidden="true" tabindex="-1"></a>hmmbuild results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>new.hmm results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>out.aln</span></code></pre></div>
<ul>
<li>Searching again
The rebuilt hidden Markov model is used to search for the domain once more. A model produced by <code>hmmbuild</code> from a plain alignment carries no Pfam trusted-cutoff (TC) annotation, so <code>--cut_tc</code> is not used in this search; hits are filtered by <code>E-value</code> afterwards instead.</li>
</ul>
<div class="sourceCode" id="cb158"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb158-1"><a href="genomics.html#cb158-1" aria-hidden="true" tabindex="-1"></a>hmmsearch <span class="sc">--</span>domtblout results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>new.out.txt results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>new.hmm data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.pep<span class="sc">*</span></span></code></pre></div>
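<ul>
<li>Filtering the results
The results of the second search are filtered, again by <code>E-value</code> (column 7 of the <code>--domtblout</code> table, the full-sequence E-value).</li>
</ul>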
<div class="sourceCode" id="cb159"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb159-1"><a href="genomics.html#cb159-1" aria-hidden="true" tabindex="-1"></a>grep <span class="sc">-</span>v <span class="st">"^#"</span> results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>new.out.txt<span class="sc">|</span>awk <span class="st">'$7&lt;0.001 {print}'</span> <span class="sc">&gt;</span> results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>domain.new.out.selected.txt</span></code></pre></div>
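<p>For readers who prefer to do this filtering outside the shell, the following Python sketch mirrors the <code>grep</code>/<code>awk</code> command above: it drops comment lines and keeps rows whose seventh column is below the cutoff. The function names <code>filter_by_evalue</code> and <code>unique_targets</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def filter_by_evalue(lines, cutoff=0.001, column=6):
    """Keep table rows whose E-value column is strictly below the cutoff."""
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # same effect as grep -v "^#"
        evalue = float(line.split()[column])  # index 6 is awk's $7
        if cutoff > evalue:  # strict comparison, as in the awk filter
            kept.append(line)
    return kept


def unique_targets(lines):
    """Target names (first column) in first-seen order, without repeats."""
    return list(dict.fromkeys(line.split()[0] for line in lines))


rows = [
    "# target name ...",
    "p1.t1 - 350 WRKY - 59 1.2e-40 135.0 0.1 1 2",
    "p1.t1 - 350 WRKY - 59 1.2e-40 135.0 0.1 2 2",
    "p2.t1 - 210 WRKY - 59 0.5 8.0 0.0 1 1",
]
print(unique_targets(filter_by_evalue(rows)))
</code></pre></div>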
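<ul>
<li>Removing redundant IDs
The previous step identifies which genes of the species are candidate members of the target family. Because one gene can correspond to several mRNAs, only one representative mRNA per candidate gene is needed for the downstream analysis. Extracting the unique IDs has to be done by hand (it does not take long). Put the selected mRNA IDs in the first column and save the file as <code>uniqueID.txt</code>.</li>
</ul>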
<div class="sourceCode" id="cb160"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb160-1"><a href="genomics.html#cb160-1" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>select_redundant_mRNA.pl results<span class="sc">/</span>step.<span class="fl">1.</span>get.mRNA.and.gene.ID<span class="sc">/</span>mRNA2geneID.txt results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>domain.new.out.selected.txt results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>remove_redundant_IDlist.txt</span></code></pre></div>
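<p>The chapter picks the representative mRNA by hand. As one possible way to automate it, the sketch below keeps the transcript with the longest protein for each gene. This "longest isoform" rule is an assumption introduced here, not the author's criterion, and <code>pick_representatives</code> is a hypothetical name.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def pick_representatives(mrna_to_gene, protein_lengths, candidates):
    """Choose one mRNA per gene: the longest protein, first one on ties."""
    best = {}
    for mrna in candidates:
        gene = mrna_to_gene[mrna]
        current = best.get(gene)
        if current is None or protein_lengths[mrna] > protein_lengths[current]:
            best[gene] = mrna
    return best


mrna_to_gene = {"g1.t1": "g1", "g1.t2": "g1", "g2.t1": "g2"}
protein_lengths = {"g1.t1": 300, "g1.t2": 350, "g2.t1": 120}
chosen = pick_representatives(
    mrna_to_gene, protein_lengths, ["g1.t1", "g1.t2", "g2.t1"]
)
print(sorted(chosen.values()))  # candidate content for uniqueID.txt
</code></pre></div>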
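<ul>
<li>Extracting protein sequences
Once the gene IDs are available, extract the protein sequences for the downstream analysis. Confirm with <a href="http://smart.embl.de/">SMART</a>, <a href="http://pfam.xfam.org/search">Pfam</a> or <a href="https://www.ncbi.nlm.nih.gov/cdd/">NCBI CDD</a> that each gene really contains the domain, and discard any gene that does not. Genes for which <code>SMART</code> found no domain are listed in <code>gene.not.in.SMART.txt</code>; in <code>Pfam</code> every hit was a <code>WRKY</code> domain, and the corresponding file is <code>Pfam.results.txt</code>.</li>
</ul>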
<div class="sourceCode" id="cb161"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb161-1"><a href="genomics.html#cb161-1" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>get_fa_by_id.pl results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>uniqueID.txt data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.pep<span class="sc">*</span> results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>pep.need.confirm.fa</span></code></pre></div>
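<p>Extracting sequences by ID is simple enough to sketch without the Perl helper. The Python below reads FASTA records and writes out only the requested IDs; it assumes the record ID is the first whitespace-delimited token after <code>&gt;</code>, and the function names are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def read_fasta(lines):
    """Read FASTA lines into a dict of record ID to sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first token of the header line.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def subset_fasta(records, wanted_ids):
    """Return FASTA text for the wanted IDs, skipping IDs that are absent."""
    return "".join(
        ">{}\n{}\n".format(i, records[i]) for i in wanted_ids if i in records
    )


fasta = [">g1.t1 some description", "MKT", "AYI", ">g2.t1", "ACDEFG"]
records = read_fasta(fasta)
print(subset_fasta(records, ["g2.t1", "missing"]))
</code></pre></div>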
<ul>
<li>Computing protein molecular weight and related statistics</li>
</ul>
<div class="sourceCode" id="cb162"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb162-1"><a href="genomics.html#cb162-1" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>stat_protein_fa.pl results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>pep.need.confirm.fa results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>pep.MW.txt</span></code></pre></div>
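<p>As a rough illustration of what a molecular-weight calculation involves, the sketch below sums standard average residue masses and adds one water molecule for the free termini. Which statistics <code>stat_protein_fa.pl</code> reports, and which mass table it uses, is not stated in this chapter; <code>molecular_weight</code> is a hypothetical name.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(seq):
    """Average molecular weight of an unmodified peptide, in daltons.

    A trailing '*' (stop symbol) is ignored; any other non-standard
    residue letter raises KeyError so that it is not silently miscounted.
    """
    seq = seq.upper().rstrip("*")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("GA"), 3))
</code></pre></div>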
<ul>
<li>Building the phylogenetic tree
Align the sequences with <code>CLUSTALW</code>, then build the tree with <code>MEGA</code>. For converting <code>CLUSTALW</code> output to <code>.fasta</code> format, see <a href="r-deve.html#pac4xiang">4.2</a>.</li>
<li>Motif analysis</li>
</ul>
<div class="sourceCode" id="cb163"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb163-1"><a href="genomics.html#cb163-1" aria-hidden="true" tabindex="-1"></a>meme results<span class="sc">/</span>step.<span class="fl">3.</span>seq.and.tree<span class="sc">/</span>pep_confirmed.fa <span class="sc">-</span>protein <span class="sc">-</span>oc results<span class="sc">/</span>step.<span class="fl">4.</span>motif<span class="sc">/</span> <span class="sc">-</span>nostatus <span class="sc">-</span>time <span class="dv">18000</span> <span class="sc">-</span>maxsize <span class="dv">6000000</span> <span class="sc">-</span>mod anr <span class="sc">-</span>nmotifs <span class="dv">10</span> <span class="sc">-</span>minw <span class="dv">6</span> <span class="sc">-</span>maxw <span class="dv">100</span></span></code></pre></div>
<ul>
<li>Gene structure analysis</li>
</ul>
<div class="sourceCode" id="cb164"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb164-1"><a href="genomics.html#cb164-1" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>get_gene_exon_from_gff.pl <span class="sc">-</span>in1 results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>uniqueID.txt <span class="sc">-</span>in2 data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.gff<span class="sc">*</span> <span class="sc">-</span>out results<span class="sc">/</span>step.<span class="fl">5.</span>gene.structure<span class="sc">/</span>gene_exon_info.gff</span></code></pre></div>
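<p>Gene structure plots are drawn from exon coordinates. The sketch below shows, under the assumption of a GFF3 file whose <code>exon</code> rows point to their transcript through <code>Parent=</code>, how the exons of selected transcripts can be collected and how intron intervals follow from them. The function names are hypothetical and this is not the logic of <code>get_gene_exon_from_gff.pl</code> itself.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def exons_by_transcript(gff_lines, wanted):
    """Collect sorted (start, end) exon intervals for the wanted mRNA IDs."""
    wanted = set(wanted)
    exons = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent in wanted:
                interval = (int(cols[3]), int(cols[4]))
                exons.setdefault(parent, []).append(interval)
    return {key: sorted(value) for key, value in exons.items()}


def introns(exon_list):
    """Gaps between consecutive sorted exons, 1-based inclusive."""
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(exon_list, exon_list[1:])
    ]


gff = [
    "chr1\tsrc\texon\t301\t400\t.\t+\t.\tID=e2;Parent=g1.t1",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=g1.t1",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e3;Parent=g9.t1",
]
structure = exons_by_transcript(gff, ["g1.t1"])
print(structure, introns(structure["g1.t1"]))
</code></pre></div>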
<ul>
<li>Chromosomal location of the genes</li>
</ul>
<div class="sourceCode" id="cb165"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb165-1"><a href="genomics.html#cb165-1" aria-hidden="true" tabindex="-1"></a>samtools faidx data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.dna<span class="sc">*</span></span>
<span id="cb165-2"><a href="genomics.html#cb165-2" aria-hidden="true" tabindex="-1"></a>cp data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.fai results<span class="sc">/</span>step.<span class="fl">5.</span>gene.structure<span class="sc">/</span></span>
<span id="cb165-3"><a href="genomics.html#cb165-3" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>get_gene_weizhi.pl <span class="sc">-</span>in1 results<span class="sc">/</span>step.<span class="fl">2.</span>domain.search<span class="sc">/</span>uniqueID.txt <span class="sc">-</span>in2 data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>.gff<span class="sc">*</span> <span class="sc">-</span>out results<span class="sc">/</span>step.<span class="fl">5.</span>gene.structure<span class="sc">/</span>mrna_location.txt</span></code></pre></div>
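<p>The two inputs of this step can be read with a few lines of Python. The sketch assumes the standard <code>.fai</code> layout written by <code>samtools faidx</code> (sequence name in column 1, length in column 2) and GFF3 <code>mRNA</code> rows with an <code>ID=</code> attribute; the function names are hypothetical, and the output format of <code>get_gene_weizhi.pl</code> may differ.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def chrom_lengths(fai_lines):
    """Sequence name to length, from the first two columns of a .fai index."""
    lengths = {}
    for line in fai_lines:
        cols = line.split("\t")
        if len(cols) >= 2:
            lengths[cols[0]] = int(cols[1])
    return lengths


def mrna_locations(gff_lines, wanted):
    """mRNA ID to (chromosome, start, end, strand) for the wanted IDs."""
    wanted = set(wanted)
    locations = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if attrs.get("ID") in wanted:
            locations[attrs["ID"]] = (
                cols[0], int(cols[3]), int(cols[4]), cols[6]
            )
    return locations


fai = ["chr1\t1000\t6\t60\t61\n"]
gff = [
    "chr1\tsrc\tmRNA\t100\t900\t.\t-\t.\tID=g1.t1;Parent=g1",
    "chr1\tsrc\tmRNA\t950\t990\t.\t+\t.\tID=g2.t1;Parent=g2",
]
print(chrom_lengths(fai), mrna_locations(gff, ["g1.t1"]))
</code></pre></div>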
<ul>
<li>Cis-acting element analysis
The script's default promoter length is 1500 bp. Upload the extracted promoter sequences to <a href="https://bioinformatics.psb.ugent.be/webtools/plantcare/html/">PlantCARE</a> for analysis.</li>
</ul>
<div class="sourceCode" id="cb166"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb166-1"><a href="genomics.html#cb166-1" aria-hidden="true" tabindex="-1"></a>perl code<span class="sc">/</span>script<span class="sc">/</span>get_promoter.pl data<span class="sc">/</span>unzip_data<span class="sc">/</span><span class="er">*</span>dna.top<span class="sc">*</span> results<span class="sc">/</span>step.<span class="fl">5.</span>gene.structure<span class="sc">/</span>mrna_location.txt results<span class="sc">/</span>step.<span class="fl">6.</span>cis.acting.element<span class="sc">/</span>promoter.txt</span></code></pre></div>
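<p>Promoter extraction depends on strand, which is easy to get wrong. The sketch below takes the region upstream of a gene, using the 1500 bp default mentioned above, and reverse-complements it for genes on the minus strand. It assumes 1-based inclusive GFF coordinates and clips at the chromosome ends; how <code>get_promoter.pl</code> handles these cases is not documented here, and the function names are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter(chrom_seq, start, end, strand, length=1500):
    """Upstream region of a gene, read in the gene's own orientation.

    start and end are 1-based inclusive coordinates, as in GFF.
    The region is clipped at the chromosome ends.
    """
    if strand == "+":
        # Upstream lies before the gene start on the forward strand.
        left = max(0, start - 1 - length)
        return chrom_seq[left:start - 1]
    # Minus strand: upstream lies after the gene end on the forward strand.
    right = min(len(chrom_seq), end + length)
    return reverse_complement(chrom_seq[end:right])


chrom = "ACGTACGTAAGG"
print(promoter(chrom, 5, 8, "+", length=3))
print(promoter(chrom, 5, 8, "-", length=3))
</code></pre></div>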
<ul>
<li>Subcellular localization of family members
Two websites can be used: <a href="https://wolfpsort.hgc.jp/">WolfPsort</a> and <a href="http://cello.life.nctu.edu.tw/">Cello</a>.</li>
</ul>
</div>
</div>
<div id="基因组共线性分析" class="section level2" number="11.2">
<h2>
<span class="header-section-number">11.2</span> Genome Synteny Analysis<a class="anchor" aria-label="anchor" href="#%E5%9F%BA%E5%9B%A0%E7%BB%84%E5%85%B1%E7%BA%BF%E6%80%A7%E5%88%86%E6%9E%90"><i class="fas fa-link"></i></a>
</h2>
<p>Software: JCVI, <a href="https://github.com/lixiang117423/jcvi" class="uri">https://github.com/lixiang117423/jcvi</a></p>

</div>
</div>
  <div class="chapter-nav">
<div class="prev"><a href="bio-env.html"><span class="header-section-number">10</span> Environment Setup</a></div>
<div class="next"><a href="rnaseq.html"><span class="header-section-number">12</span> RNA-Seq</a></div>
</div></main><div class="col-md-3 col-lg-2 d-none d-md-block sidebar sidebar-chapter">
    <nav id="toc" data-toggle="toc" aria-label="On this page"><h2>On this page</h2>
      <ul class="nav navbar-nav">
<li><a class="nav-link" href="#genomics"><span class="header-section-number">11</span> Genomics</a></li>
<li>
<a class="nav-link" href="#%E5%9F%BA%E5%9B%A0%E5%AE%B6%E6%97%8F%E5%88%86%E6%9E%90"><span class="header-section-number">11.1</span> Gene Family Analysis</a><ul class="nav navbar-nav">
<li><a class="nav-link" href="#%E5%9F%BA%E5%9B%A0%E5%AE%B6%E6%97%8F%E5%88%86%E6%9E%90%E6%80%9D%E8%B7%AF%E5%8F%8A%E6%96%87%E7%AB%A0%E6%92%B0%E5%86%99%E6%80%9D%E8%B7%AF"><span class="header-section-number">11.1.1</span> Analysis Strategy and Manuscript Outline</a></li>
<li><a class="nav-link" href="#%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87"><span class="header-section-number">11.1.2</span> Data Preparation</a></li>
<li><a class="nav-link" href="#%E8%BD%AF%E4%BB%B6%E5%87%86%E5%A4%87"><span class="header-section-number">11.1.3</span> Software Preparation</a></li>
<li><a class="nav-link" href="#%E5%88%86%E6%9E%90%E8%BF%87%E7%A8%8B"><span class="header-section-number">11.1.4</span> Analysis Workflow</a></li>
</ul>
</li>
<li><a class="nav-link" href="#%E5%9F%BA%E5%9B%A0%E7%BB%84%E5%85%B1%E7%BA%BF%E6%80%A7%E5%88%86%E6%9E%90"><span class="header-section-number">11.2</span> Genome Synteny Analysis</a></li>
</ul>

      <div class="book-extra">
        <ul class="list-unstyled">
<li><a id="book-source" href="https://github.com/lixiang117423/book4xiang/blob/master/11.bioinf-genomics.Rmd">View source <i class="fab fa-github"></i></a></li>
        </ul>
</div>
    </nav>
</div>

</div>
</div> <!-- .container -->

<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

  <div class="col-12 col-md-6 mt-3">
    <p>"<strong>小蓝哥的知识荒原</strong>" was written by 李详. It was last built on 2021-10-01.</p>
  </div>

  <div class="col-12 col-md-6 mt-3">
    <p>This book was built by the <a class="text-light" href="https://bookdown.org">bookdown</a> R package.</p>
  </div>

</div></div>
</footer><!-- dynamically load mathjax for compatibility with self-contained --><script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    var src = "true";
    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
    if (location.protocol !== "file:")
      if (/^https?:/.test(src))
        src = src.replace(/^https?:/, '');
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script><script type="text/x-mathjax-config">const popovers = document.querySelectorAll('a.footnote-ref[data-toggle="popover"]');
for (let popover of popovers) {
  const div = document.createElement('div');
  div.setAttribute('style', 'position: absolute; top: 0; left: 0; width: 0; height: 0; overflow: hidden; visibility: hidden;');
  div.innerHTML = popover.getAttribute('data-content');

  var has_math = div.querySelector("span.math");
  if (has_math) {
    document.body.appendChild(div);
    MathJax.Hub.Queue(["Typeset", MathJax.Hub, div]);
    MathJax.Hub.Queue(function() {
      popover.setAttribute('data-content', div.innerHTML);
      document.body.removeChild(div);
    })
  }
}
</script>
</body>
</html>
